Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Prevent error page recursion #35209

Merged
merged 1 commit into from
Jul 31, 2024
Merged

Conversation

timmc-edx
Copy link
Contributor

@timmc-edx timmc-edx commented Jul 31, 2024

We sometimes see rendering errors in the error page itself, which then cause another attempt at rendering the error page. I'm not sure exactly how the loop is occurring, but it looks something like this:

  1. An error is raised in a view or middleware and is not caught by application code
  2. Django catches the error and calls the registered uncaught error handler
  3. Our handler tries to render an error page
  4. The rendering code raises an error
  5. GOTO 2 (until some sort of server limit is reached)

By catching all errors raised during error-page render and substituting in a hardcoded string, we can reduce server resources, avoid logging massive sequences of recursive stack traces, and still give the user some indication that yes, there was a problem.

This should help address #35151

At least one of these rendering errors is known to be due to a translation error. There's a separate issue for restoring translation quality so that we avoid those issues in the future (openedx/openedx-translations#549) but in general we should catch all rendering errors, including unknown ones.

Testing:

  • In lms/envs/devstack.py change DEBUG to False to ensure that the usual error page is displayed (rather than the debug error page).
  • Add line 1/0 to the top of the student_dashboard function in common/djangoapps/student/views/dashboard.py to make that view error.
  • In lms/templates/static_templates/server-error.html replace static.get_platform_name() with None * 7 to make the error template itself produce an error.
  • Visit http://localhost:18000/dashboard.

Without the fix, the response takes 10 seconds and produces a 6 MB, 85k line set of stack traces and the page displays "A server error occurred. Please contact the administrator."

With the fix, the response takes less than a second and produces three stack traces (one of which contains the error page's rendering error).

We sometimes see rendering errors in the error page itself, which then
cause another attempt at rendering the error page. I'm not sure _exactly_
how the loop is occurring, but it looks something like this:

1. An error is raised in a view or middleware and is not caught by
   application code
2. Django catches the error and calls the registered uncaught error
   handler
3. Our handler tries to render an error page
4. The rendering code raises an error
5. GOTO 2 (until some sort of server limit is reached)

By catching all errors raised during error-page render and substituting in
a hardcoded string, we can reduce server resources, avoid logging massive
sequences of recursive stack traces, and still give the user *some*
indication that yes, there was a problem.

This should help address #35151

At least one of these rendering errors is known to be due to a translation
error. There's a separate issue for restoring translation quality so that
we avoid those issues in the future (openedx/openedx-translations#549)
but in general we should catch all rendering errors, including unknown
ones.

Testing:

- In `lms/envs/devstack.py` change `DEBUG` to `False` to ensure that the
  usual error page is displayed (rather than the debug error page).
- Add line `1/0` to the top of the `student_dashboard` function in
 `common/djangoapps/student/views/dashboard.py` to make that view error.
- In `lms/templates/static_templates/server-error.html` replace
  `static.get_platform_name()` with `None * 7` to make the error template
  itself produce an error.
- Visit <http://localhost:18000/dashboard>.

Without the fix, the response takes 10 seconds and produces a 6 MB, 85k
line set of stack traces and the page displays "A server error occurred.
Please contact the administrator."

Without the fix, the response takes less than a second and produces three
stack traces (one of which contains the error page's rendering error).
@timmc-edx timmc-edx force-pushed the timmc/error-page-render-recurse branch from 035677c to 012a3d9 Compare July 31, 2024 17:12
@timmc-edx timmc-edx merged commit 0359d52 into master Jul 31, 2024
49 checks passed
@timmc-edx timmc-edx deleted the timmc/error-page-render-recurse branch July 31, 2024 17:36
@edx-pipeline-bot
Copy link
Contributor

2U Release Notice: This PR has been deployed to the edX staging environment in preparation for a release to production.

@edx-pipeline-bot
Copy link
Contributor

2U Release Notice: This PR has been deployed to the edX production environment.

1 similar comment
@edx-pipeline-bot
Copy link
Contributor

2U Release Notice: This PR has been deployed to the edX production environment.

@feanil
Copy link
Contributor

feanil commented Aug 8, 2024

@cmltaWt0 This should be backported to Redwood if we can.

cmltaWt0 pushed a commit to raccoongang/edx-platform that referenced this pull request Aug 8, 2024
We sometimes see rendering errors in the error page itself, which then
cause another attempt at rendering the error page. I'm not sure _exactly_
how the loop is occurring, but it looks something like this:

1. An error is raised in a view or middleware and is not caught by
   application code
2. Django catches the error and calls the registered uncaught error
   handler
3. Our handler tries to render an error page
4. The rendering code raises an error
5. GOTO 2 (until some sort of server limit is reached)

By catching all errors raised during error-page render and substituting in
a hardcoded string, we can reduce server resources, avoid logging massive
sequences of recursive stack traces, and still give the user *some*
indication that yes, there was a problem.

This should help address openedx#35151

At least one of these rendering errors is known to be due to a translation
error. There's a separate issue for restoring translation quality so that
we avoid those issues in the future (openedx/openedx-translations#549)
but in general we should catch all rendering errors, including unknown
ones.

Testing:

- In `lms/envs/devstack.py` change `DEBUG` to `False` to ensure that the
  usual error page is displayed (rather than the debug error page).
- Add line `1/0` to the top of the `student_dashboard` function in
 `common/djangoapps/student/views/dashboard.py` to make that view error.
- In `lms/templates/static_templates/server-error.html` replace
  `static.get_platform_name()` with `None * 7` to make the error template
  itself produce an error.
- Visit <http://localhost:18000/dashboard>.

Without the fix, the response takes 10 seconds and produces a 6 MB, 85k
line set of stack traces and the page displays "A server error occurred.
Please contact the administrator."

With the fix, the response takes less than a second and produces three
stack traces (one of which contains the error page's rendering error).
cmltaWt0 added a commit that referenced this pull request Aug 8, 2024
…nder-recurse

[Backport] fix: Prevent error page recursion (#35209)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants